Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Besselj and Besselk for all integer and noninteger postive orders #29

Merged
merged 14 commits into from
Aug 4, 2022

Conversation

heltonmc
Copy link
Member

@heltonmc heltonmc commented Aug 3, 2022

Implementation of besselk and besseli for all real orders. Still have several things to finish and touch up but these methods so far seem to work

besseli is very straightforward. Use the uniform asymptotic expansion when either x or nu is large (x>35, nu>25) and then the power series elsewhere... now I also have the asymptotic expansion when x is large in there that is significantly faster but may just remove so the whole function has just two branches... decision point for later..

besselk is obviously not straightforward. However, the same uniform asymptotic expansion applies when either x or nu is large (x>35, nu>25) so use that. Now we could also use an asymptotic expansion with respect to x here as well (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1197096678), but again at the cost of an extra branch.

So the rest comes down to what do we do when both x < 35 and nu < 25... The power series form for besselk is accurate when x < 2 or nu > 1.6x - 1.0 so we could use that but that doesn't help much for x > 2 and adds an extra branch. We could use numerical integration here from small orders and then use forward recurrence (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1177801831) and subsequent optimizations.... We could also use the hypergeometric series (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1196836506) for small orders then use a continued fraction approach to generate k_[nu+1} and then use forward recurrence to fill everything else..... we could also use some type of polynomial approximation (Chebyshev) but I found that this only worked well for a limited range.

I think the simplest method is to take advantage of the fact that the besseli power series is very accurate for nu<25 for any x value.... the advantage is we can also generate I_{nu+1} very easily so we can get very accurate estimates of I_{nu+1}/I_{nu} for any value of nu in this range. Now, we can also use a continued fraction approach to get K_{nu+1}/K_{nu} (see https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1195726642). So using the two ratios we can use the Wronskian to get K_{nu} very simply with a lot of code reuse.

Here are some benchmarks for this method.

julia> @benchmark Bessels.besselk_continued_fraction(15.2, 17.0)
BenchmarkTools.Trial: 10000 samples with 700 evaluations.
 Range (min  max):  179.583 ns  276.309 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     179.881 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   180.464 ns ±   3.142 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆█▇▄▁                            ▂▃▁                         ▁
  ███████▇▆▆▆▅▅▅▅▄▃▄▄▄▄▅▅▅▄▄▄▄▄▃▃▂▄▇███▆▃▃▅▂▄▄▄▂▄▂▃▄▃▅▅▅▆▆▇▆▆▅▅ █
  180 ns        Histogram: log(frequency) by time        187 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.


julia> @benchmark SpecialFunctions.besselk(15.2, 17.0)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
 Range (min  max):  435.606 ns  782.409 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     441.081 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   450.682 ns ±  26.869 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃██▆▃▃▁▃▄▂▁▁       ▁                                          ▂
  █████████████████▇▇████▇▇▆▇▇▇▇█▇▇▆▇█▇▆▇▇▇▇▆▆▆▇▆▇▆▆▅▅▄▅▅▅▅▅▅▅▄ █
  436 ns        Histogram: log(frequency) by time        571 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.

So almost 2.5x faster than SpecialFunctions....

O and and here is a plot of the relative errors compared to ArbNumerics.jl
Screen Shot 2022-08-03 at 12 28 49 PM

With a maximum error of..

julia> maximum(out)
1.33399240780568911954050188375962391214648355184706163492721e-15

For reference this more accurate than SpecialFunctions.jl which had a maximum error of 5e-15.

@heltonmc
Copy link
Member Author

heltonmc commented Aug 3, 2022

Some general things going forward.... Do we want to have another branch for large x that would be faster at the cost of more branches? Unsure honestly. We can look more at @cgeoga large argument expansion https://github.com/heltonmc/Bessels.jl/issues/17#issuecomment-1197096678. I have one for besseli that is really fast and accurate in the appropriate range

@oscardssmith
Copy link
Member

That looks great!

@cgeoga
Copy link
Contributor

cgeoga commented Aug 3, 2022

Hey, sorry to have disappeared for a few days! I'm about to enter a pretty swamped period and will probably be slower for the next week or so.

Those times for the intermediate range are very impressive to me. I played a bit more with the Miller recurrence, but I think to get to the tolerances you are going for here it would just be too expensive. With regard to very large x, I'm surprised about the standard asymptotic expansion not being fast enough. With my random choice of (v,x) = (0.3,38.1), if I crank up the order to 10 instead of 6 that was in your test, even without the exponential improvement I get 1 - (...)/(...) = 8.8e-16 or so at a speed cost 16 ns. That suggests a higher x cutoff than I expected, but in my experience the uniform asymptotic expansion is much slower to converge as x gets larger when v is small. With that said, though, the AMOS library (figure 2 here) does not seem to have a special branch for this. The Temme 1975 paper does use one, though.

I like the idea of the code re-use with the besseli series and the Wronskian. That is a strategy that I missed but seems very simple and elegant. I wish I had taken the continued fraction thing more seriously earlier in the project and given that version a whack.

With regard to extra branches, I would think there are enough ns on the table where it would make sense to have several for various specialized cases, like when v = k + 1/2 for integer k (although I'm biased about this obviously). I would also think that a power series approach might be way faster for, say, x<0.1. I don't really know what the computational physics applications are and what kinds of (v,x) you typically use, but in my applications I might hit that case a lot.

@heltonmc
Copy link
Member Author

heltonmc commented Aug 3, 2022

Yes - the miller recurrence is definitely something else to consider! I think that is what Amos is using? So I would imagine that we might be able to slightly faster than what they have but similar accuracies. I do like this approach as the amount of code is much smaller and easier to optimize and diagnose branch cutoffs.

But to your point. The continued fraction approach is much slower when x < 1 so we should use a power series here for sure. I'll do that now.

For the large argument, I'm very happy to include that. Maybe once I merge this you could submit a PR with the large argument branch that you wrote? We can then integrate I think. I do like to lean on the asymptotic expansions just because they are so optimized (and I use them for every single routine) but definitely when x>>nu a different expansion will be better so let's do that.

I think for specific nu like 1/2 it would be worth having an optimized branch. Maybe as a separate PR after sort this all out. I'm also going to work on nu =1/3 because we can use those for Airy functions which are needed as well.... but in time.

@heltonmc
Copy link
Member Author

heltonmc commented Aug 3, 2022

Here is a plot of the relative errors using the uniform asymptotic expansion (with 10 U polynomials). So it looks like it converges fairly well even when nu is small.

Screen Shot 2022-08-03 at 1 48 08 PM

Which is quite fast (not quite as the asymptotic expansion would be for large x though)

julia> @benchmark Bessels.besselk_large_orders(120.0,80.0)
BenchmarkTools.Trial: 10000 samples with 968 evaluations.
 Range (min  max):  79.545 ns  129.347 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     79.718 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.406 ns ±   3.194 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃       ▃                                                   ▁
  ██▅▅▅▅▅▄▅██▃▄▄▆▇▄▄▂▄▂▄▂▃▄▃▃▄▅▄▄▃▄▃▃▃▂▃▄▄▄▃▄▃▅▄▄▅▄▂▄▅▅▄▄▅▃▄▂▄ █
  79.5 ns       Histogram: log(frequency) by time      98.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@cgeoga
Copy link
Contributor

cgeoga commented Aug 3, 2022

Fantastic. This code is sort of on a pedestal for me and I'm nervous to submit an un-clean PR, but I'm very happy to give it a shot. Do you have a series code already? If not, our series code is pretty decent, I think, and I could try to clean it up and submit it for consideration as well in a separate issue or something.

Once this merges I'll definitely submit an asymptotic expansion PR. Do you want the exponential correction in that? The way I wrote it in our code it uses the maximum possible order, but maybe it would be faster for me to hard-code just using the leading term in it or something, if you wanted. Happy to submit it that way and potentially prune it off if the speed cost is too high.

I could also add the half-integer order code from #25 in that same PR if you'd like.

Also, funny about that plot---in my mind, an order of, like, 3 is "large" because of my application. I keep forgetting that you want to work with an order of 50 and stuff.

@heltonmc
Copy link
Member Author

heltonmc commented Aug 3, 2022

So I'd probably pose the following branches.

  1. nu > 25.0 || x > 35.0
    • use the uniform asymptotic expansion
  2. x < ~2.0
    • use the power series
  3. else
  • use continued fraction with wronskian
  • inu and inum1 are generate simultaneously with power series

With the option of a 4th branch for asymptotic x >> nu.

That sounds pretty reasonable to me...?

@heltonmc
Copy link
Member Author

heltonmc commented Aug 3, 2022

Definitely don't feel that way. The code is very good so I think we can just drop in and clean some stuff up! But let's plan on that when you have time submit the asymptotic expansion for large x and we can go through it. The exponential correction sounds reasonable but we should have a pretty specific cutoff where we want to use it and set the order to a specific value for the entire branch.

I do not have a power series for besselk that I am super happy with so it might be great to just reuse what you have. I would like to probably include that in this PR though. I can coauthor a commit if you want with that implementation here?

But yes if you want half orders please go ahead and submit that (preferably as a separate PR)!

Haha about the orders! In my research I deal with a lot of infinite series over the order so we need calculations for very high (nu>1e4) numbers....

Co-authored-by: Chris Geoga <[email protected]>
@cgeoga
Copy link
Contributor

cgeoga commented Aug 3, 2022

I appreciate that! I see you committed the code from the repo directly, but here is a slightly cleaned up version that I think is a slightly better style fit:

function _besselk_ser(v, x)
  (maxit, tol) = (100, 1e-15)
  T = promote_type(typeof(x), typeof(v))
  # precompute a handful of things:
  xd2  = x/T(2)
  xd22 = xd2*xd2
  half = one(T)/T(2)
  # (x/2)^(±v). Writing the literal power doesn't seem to change anything here,
  # and I think this is faster:
  lxd2     = log(xd2)
  xd2_v    = exp(v*lxd2)
  xd2_nv   = exp(-v*lxd2)
  # use the gamma function a couple times to start:
  gam_v    = gamma(v)
  gam_nv   = gamma(-v)
  gam_1mv  = -gam_nv*v # == gamma(one(T)-v)
  gam_1mnv = gam_v*v   # == gamma(one(T)+v)
  (gpv, gmv) = (gam_1mnv, gam_1mv)
  # One final re-compression of a few things:
  _t1 = gam_v*xd2_nv*gam_1mv
  _t2 = gam_nv*xd2_v*gam_1mnv
  # A couple series-specific accumulators:
  (xd2_pow, fact_k, floatk, out) = (one(T), one(T), zero(T), zero(T))
  for _j in 0:maxit
    t1   = half*xd2_pow
    tmp  = _t1/(gmv*fact_k)
    tmp += _t2/(gpv*fact_k)
    term = t1*tmp
    out += term
    abs(term/out) < tol && return out
    # Use the trick that gamma(1+k+1+v) == gamma(1+k+v)*(1+k+v) to skip gamma calls:
    (gpv, gmv) = (gpv*(one(T)+v+floatk), gmv*(one(T)-v+floatk)) 
    xd2_pow *= xd22
    fact_k  *= (floatk+1)
    floatk  += one(T)
  end
  throw(error("$maxit iterations reached without achieving atol $tol."))
end

Note that the modified version in our repo is not exp(x)*besselk(v,x), it is (x^v)*besselk(v, x) (which shows up in the Matern covariance), so that probably isn't useful to you here and I cut it out.

The issue I'm having, though, is that my 1 - (...)/(...) errors are really about 2e-14 and I can't do much better. Does anything here stand out to you as problematic?

EDIT: the code is a bit obfuscated, so for reference I'm using equation 3.2 from here.

src/besselk.jl Outdated Show resolved Hide resolved
src/besselk.jl Outdated Show resolved Hide resolved
src/besselk.jl Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Aug 3, 2022

Codecov Report

Merging #29 (dfbd16d) into master (a09b94d) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
+ Coverage   99.06%   99.10%   +0.04%     
==========================================
  Files          14       14              
  Lines        1172     1231      +59     
==========================================
+ Hits         1161     1220      +59     
  Misses         11       11              
Impacted Files Coverage Δ
src/U_polynomials.jl 99.24% <100.00%> (-0.04%) ⬇️
src/besseli.jl 100.00% <100.00%> (ø)
src/besselk.jl 100.00% <100.00%> (ø)
src/gamma.jl 100.00% <100.00%> (ø)

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

src/besselk.jl Show resolved Hide resolved
src/besseli.jl Show resolved Hide resolved
@heltonmc heltonmc marked this pull request as ready for review August 3, 2022 23:47
@heltonmc
Copy link
Member Author

heltonmc commented Aug 4, 2022

I think this is ready to merge now. I think the last thing is still the idea of potentially returning complex numbers for real (negative) inputs. Mathematica will return complex values for real inputs but SpecialFunctions.jl will not.

julia> using SpecialFunctions

julia> bessely(1.2, -0.2)
ERROR: DomainError with -0.2:
`x` must be nonnegative.
Stacktrace:
 [1] bessely(nu::Float64, x::Float64)
   @ SpecialFunctions ~/.julia/packages/SpecialFunctions/NBIqR/src/bessel.jl:556
 [2] top-level scope
   @ REPL[2]:1

julia> bessely(1.2, -Complex(0.2))
3.8701259814738727 - 2.904048804819615im

Matlab will output complex numbers for real inputs.

>> bessely(1.2, -0.2)

ans =

   3.8701 - 2.9040i

>> bessely(1.2, -complex(0.2))

ans =

   3.8701 - 2.9040i

I'm wondering what the best is here?

@oscardssmith
Copy link
Member

Julia's general practice is to throw a domain error in cases like this (and require the input to be complex)

@heltonmc
Copy link
Member Author

heltonmc commented Aug 4, 2022

That sounds good to me. One issue is that we don't actually support arguments for nonzero imaginary parts. So we would probably have to have a additional checks for nonzero complex terms.

Edit: I'm wondering if that would be a little confusing to allow for complex arguments but Im(x) = 0 for now...

@oscardssmith
Copy link
Member

Honestly, I think we can not deal with this until we start on more general complex support.

@heltonmc
Copy link
Member Author

heltonmc commented Aug 4, 2022

Ya that sounds good to me.... I'll make an issue about it just so people are aware and put it on the updated readme!

@heltonmc heltonmc merged commit 497b118 into master Aug 4, 2022
@heltonmc heltonmc deleted the besseli_all_orders branch August 4, 2022 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants